Reducing the timeseries measurements for patients’ blood gas and pressure levels into scalar-valued indicators or quantities that best correlate with outcomes using thresholds in their values will be helpful for several reasons. Firstly, having single values for the gas and pressure levels makes them more amenable to modeling alongside the existing covariates like age, gender, and marshall/GCS scores. Secondly, models using these scalar values are necessary to establish confidence intervals for effects (bayesian credible intervals are possible without reduction to scalar values but that may be difficult to share). Lastly, using the thresholds makes everything easier to explain and visualize.
Example for how thresholds could be used to reduce timeseries measurements to single values:
Given timeseries measurements \(x_1, x_2, x_3, ..., x_n\), compute the percentage of time spent above and below threshold value \(x_t\) as \(\frac{N_1}{N_1 + N_2}\) where \(N_1 = |{x_i;x \lt x_t}|\) and \(N_2 = |{x_i;x \geq x_t}|\)
Logistic regression with special function for taking in timeseries measurements and then estimating parameters for that function that map to gas/pressure thresholds (Stan Sampling Model):
\[ logit(Pr(y_i = 1)) = \alpha + \beta \cdot X_i + f(G_{ij}) \]
where
\[ X_i = [{Gender}_i, {Age}_i, {CommaScore}_i, {MarshallScore}_i], \] \[ y_i = \{ 0 \text{ if }{GOS}_i \in [1, 2, 3], 1 \text{ if }{GOS}_i \in [4, 5] \} \]
and
\[ f(G_i) = \frac{1}{n_i} \sum_j^{n_i}{ \frac{c_1}{1 + e^{-c_2(G_{ij} - c_3)}} + \frac{c_4}{1 + e^{-c_5(G_{ij} - c_6)}} } \] \[ n_i = \text{ length of timeseries for patient }i \]
The following models were fit using only one gas/pressure measurment:
The following models were fit using one gas/pressure measurment and all covariates (age, sex, gcs, marshall):
The following models were fit using one gas/pressure measurment and all covariates (age, sex, gcs, marshall):
Based on the inferred thresholds and their proximity to some known limits, the following values were used for each quantity as the low, high, and safe ranges:
| Quantity | Low.Range | Safe.Range | High.Range |
|---|---|---|---|
| PaCO2 | [0, 28) | [28, 42] | 42+ |
| PaO2 | [0, 300) | [300, 875) | 875+ |
| PbtO2 | [0, 20) | [20, 70) | 70+ |
| ICP | NA | (-Inf, 20) | 20+ |
Exhaustive model selection via AICc with model space given by:
gos ~ age + sex + marshall + gcs + icp1_20_inf + pao2_0_300 + pao2_875_inf + pbto2_0_20 + pbto2_70_inf + paco2_0_28 + paco2_42_inf
Note that only main effects in a linear model were considered because models with nonlinearities and interactions showed no major improvements. Also, the gos outcome was binarized into two classes, one for bad outcomes (original gos in [1,2,3]) and one for good outcomes (original gos in [4,5]).
Modeling call:
glmulti(f.glmulti, data=d.glmulti, family='binomial', level=1, crit = AICc)
Best model found after exhaustive search (note that PaO2 is absent):
[1] “Number of observations = 166”
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -1.4103 | 0.2383 | -5.9189 | 0 *** |
| age | -0.5439 | 0.2408 | -2.2585 | 0.02 * |
| marshall | -0.3604 | 0.2441 | -1.476 | 0.14 |
| gcs | 0.4153 | 0.226 | 1.8372 | 0.07 . |
| icp1_20_inf | -0.7185 | 0.4051 | -1.7734 | 0.08 . |
| pbto2_0_20 | -0.4087 | 0.2239 | -1.8252 | 0.07 . |
| paco2_42_inf | 0.5408 | 0.1987 | 2.7221 | 0.01 ** |
While a model that includes all predictors (ICP, PaO2, PaCO2, and PbtO2) shows weak significances for each, simpler models including smaller groups of predictors show stronger relationships (though still fairly weak overall).
For example, this model more directly answers the question “What is the impact of hypoxic PbtO2 burden while controlling for known predictors of outcome?”:
Formula = gos ~ age + sex + marshall + gcs + icp1_20_inf + pbto2_0_20 + pbto2_70_inf
N = 166
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -1.412 | 0.2412 | -5.8553 | 0 *** |
| age | -0.642 | 0.2384 | -2.6932 | 0.01 ** |
| sex | 0.2073 | 0.2075 | 0.999 | 0.32 |
| marshall | -0.2725 | 0.2442 | -1.1158 | 0.26 |
| gcs | 0.3287 | 0.226 | 1.4545 | 0.15 |
| icp1_20_inf | -0.889 | 0.417 | -2.132 | 0.03 * |
| pbto2_0_20 | -0.4682 | 0.2298 | -2.0372 | 0.04 * |
| pbto2_70_inf | -0.3266 | 0.2934 | -1.1129 | 0.27 |
Now even in the presence of ICP, PbtO2 is still significant at a 95% level (vs a p-value of .07 in the best model)
To answer the question “Is PbtO2 better than PaO2?”, a model like that above that also includes PaO2 could be considered:
Formula = gos ~ age + sex + marshall + gcs + icp1_20_inf + pao2_0_300 + pao2_875_inf + pbto2_0_20 + pbto2_70_inf
N = 166
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -1.4675 | 0.2529 | -5.8032 | 0 *** |
| age | -0.6768 | 0.2437 | -2.7771 | 0.01 ** |
| sex | 0.2267 | 0.2098 | 1.0804 | 0.28 |
| marshall | -0.2803 | 0.2457 | -1.1409 | 0.25 |
| gcs | 0.3184 | 0.2299 | 1.385 | 0.17 |
| icp1_20_inf | -0.8922 | 0.4257 | -2.0961 | 0.04 * |
| pao2_0_300 | -0.03 | 0.2089 | -0.1436 | 0.89 |
| pao2_875_inf | -0.447 | 0.3321 | -1.3458 | 0.18 |
| pbto2_0_20 | -0.5026 | 0.2356 | -2.1331 | 0.03 * |
| pbto2_70_inf | -0.3125 | 0.2939 | -1.063 | 0.29 |
At least on the scale of statistical significance, an argument could be made that PbtO2 is better than PaO2.
Information criteria scores and statistical signifiances of different predictors show fairly weak evidence that PbtO2 is both important and a better predictor than PaO2, even after controlling for known predictors of outcome like ICP burden, Age, Gender, GCS, and Marshall scores.
Despite the above, I would argue that using information criteria and statistical significances is probably not the best way to draw these conclusions (especially given that the evidence is not overwhelming) and that a better approach may be assessing model performance in leave-one-out cross validation. A predictive accuracy measure like that would likely give a better sense of the quality of the different predictors on a more practical scale. To that end, most of what that follows will examine predictive performance instead.
All models below were tested in leave-one-out cross validation for the sake of comparing predictive performance measuremes between them like ROC-AUC:
| model.name | variables |
|---|---|
| icp | icp1_20_inf |
| paco2 | paco2_0_28, paco2_42_inf |
| pao2 | pao2_0_300, pao2_875_inf |
| pbto2 | pbto2_0_20, pbto2_70_inf |
| pao2_pbto2 | pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf |
| icp_paco2 | icp1_20_inf, paco2_0_28, paco2_42_inf |
| icp_pao2 | icp1_20_inf, pao2_0_300, pao2_875_inf |
| icp_pbto2 | icp1_20_inf, pbto2_0_20, pbto2_70_inf |
| icp_pao2_pbto2 | icp1_20_inf, pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf |
| icp_pao2_pbto2_paco2 | icp1_20_inf, pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf, paco2_0_28, paco2_42_inf |
| demo | age, sex |
| wcov_icp | age, sex, marshall, gcs, icp1_20_inf |
| wcov_paco2 | age, sex, marshall, gcs, paco2_0_28, paco2_42_inf |
| wcov_pao2 | age, sex, marshall, gcs, pao2_0_300, pao2_875_inf |
| wcov_pbto2 | age, sex, marshall, gcs, pbto2_0_20, pbto2_70_inf |
| wcov_pao2_pbto2 | age, sex, marshall, gcs, pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf |
| wcov_icp_paco2 | age, sex, marshall, gcs, icp1_20_inf, paco2_0_28, paco2_42_inf |
| wcov_icp_pao2 | age, sex, marshall, gcs, icp1_20_inf, pao2_0_300, pao2_875_inf |
| wcov_icp_pbto2 | age, sex, marshall, gcs, icp1_20_inf, pbto2_0_20, pbto2_70_inf |
| wcov_icp_pao2_pbto2 | age, sex, marshall, gcs, icp1_20_inf, pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf |
| wcov_icp_pao2_pbto2_paco2 | age, sex, marshall, gcs, icp1_20_inf, pao2_0_300, pao2_875_inf, pbto2_0_20, pbto2_70_inf, paco2_0_28, paco2_42_inf |
| wcov_none | age, sex, marshall, gcs |
| model | auc | tp | fp | tn | fn | n | |
|---|---|---|---|---|---|---|---|
| 21 | wcov_icp_pao2_pbto2_paco2 | 0.6940821 | 10 | 13 | 110 | 33 | 166 |
| 17 | wcov_icp_paco2 | 0.6908678 | 8 | 11 | 112 | 35 | 166 |
| 19 | wcov_icp_pbto2 | 0.6850066 | 7 | 9 | 114 | 36 | 166 |
| 20 | wcov_icp_pao2_pbto2 | 0.6812252 | 8 | 10 | 113 | 35 | 166 |
| 12 | wcov_icp | 0.6674230 | 7 | 6 | 117 | 36 | 166 |
| 18 | wcov_icp_pao2 | 0.6611836 | 6 | 9 | 114 | 37 | 166 |
| 13 | wcov_paco2 | 0.6604273 | 7 | 8 | 115 | 36 | 166 |
| 15 | wcov_pbto2 | 0.6555114 | 7 | 6 | 117 | 36 | 166 |
| 16 | wcov_pao2_pbto2 | 0.6521081 | 7 | 11 | 112 | 36 | 166 |
| 14 | wcov_pao2 | 0.6339573 | 2 | 3 | 120 | 41 | 166 |
| 22 | wcov_none | 0.6316884 | 1 | 2 | 121 | 42 | 166 |
| 10 | icp_pao2_pbto2_paco2 | 0.6211004 | 5 | 5 | 118 | 38 | 166 |
| 8 | icp_pbto2 | 0.6152392 | 0 | 0 | 123 | 43 | 166 |
| 9 | icp_pao2_pbto2 | 0.5889582 | 0 | 0 | 123 | 43 | 166 |
| 4 | pbto2 | 0.5825298 | 0 | 0 | 123 | 43 | 166 |
| 6 | icp_paco2 | 0.5793156 | 4 | 2 | 121 | 39 | 166 |
| 11 | demo | 0.5764795 | 0 | 0 | 123 | 43 | 166 |
| 5 | pao2_pbto2 | 0.5617319 | 0 | 0 | 123 | 43 | 166 |
| 2 | paco2 | 0.5585177 | 4 | 2 | 121 | 39 | 166 |
| 7 | icp_pao2 | 0.4794857 | 0 | 0 | 123 | 43 | 166 |
| 3 | pao2 | 0.3836264 | 0 | 0 | 123 | 43 | 166 |
| 1 | icp | 0.3433541 | 0 | 0 | 123 | 43 | 166 |
Going beyond GLM models to more sophisticated black-box models, AUC numbers don’t change drastically indicating that little is being lost by ignoring interactions and nonlinearities:
| model | GLM | KNN | RFT | GBM |
|---|---|---|---|---|
| wcov_none | 0.6316884 | 0.6470032 | 0.6012479 | 0.6358480 |
| wcov_icp | 0.6674230 | 0.6030441 | 0.6249764 | 0.6407638 |
| wcov_icp_pao2_pbto2 | 0.6812252 | 0.6394403 | 0.6245037 | 0.6942711 |
| wcov_icp_pbto2 | 0.6850066 | 0.6476650 | 0.6518245 | 0.7167707 |
| wcov_icp_pao2_pbto2_paco2 | 0.6940821 | 0.6344299 | 0.6486103 | 0.7080734 |
ROC Curves from logistic regression models [Demo turning models on and off in order]:
ROC Curves from gbm models:
ROC Curves from nearest neighbor models:
Changes in ROC curves and AUC values further indicate that PaO2 is a weaker predictor than PbtO2. However, the effect of all gases on a practical scale is still relatively weak, and even the highest performing classifiers offer a fairly minor lift over those built using only known predictors of outcome like GCS + Marshall scores and Age.
One way to answer the question “Do patients with lower PaO2 show a greater association between PbtO2 and outcome?” would be to look for potential interactions between PaO2 and PbtO2.
One such model like this would be:
gos ~ age + marshall + gcs + sex + pao2_0_300 + pbto2_0_20 + icp1_20_inf + pao2_0_300:pbto2_0_20
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -1.3827 | 0.2464 | -5.6123 | 0 *** |
| age | -0.5774 | 0.2386 | -2.4196 | 0.02 * |
| marshall | -0.3482 | 0.2411 | -1.444 | 0.15 |
| gcs | 0.3104 | 0.2322 | 1.3369 | 0.18 |
| sex | 0.2075 | 0.2085 | 0.9954 | 0.32 |
| pao2_0_300 | -0.0759 | 0.2275 | -0.3338 | 0.74 |
| pbto2_0_20 | -0.4846 | 0.2429 | -1.9951 | 0.05 * |
| icp1_20_inf | -0.8972 | 0.4254 | -2.109 | 0.03 * |
| pao2_0_300:pbto2_0_20 | -0.4769 | 0.2921 | -1.633 | 0.1 . |
How the effect of PbtO2 on predicted outcome probabilities changes with PaO2 levels:
Looking for similar interactions between PbtO2 and other predictors shows nothing new. For example, computing an exhaustive AIC search based on models that include the following predictors interacted with PbtO2 shows that only the PaO2 interaction is even remotely important:
Variables considered in search:
## [1] "age" "marshall"
## [3] "gcs" "sex"
## [5] "pao2_0_300" "pbto2_0_20"
## [7] "icp1_20_inf" "age.pbto2_0_20"
## [9] "marshall.pbto2_0_20" "gcs.pbto2_0_20"
## [11] "sex.pbto2_0_20" "pao2_0_300.pbto2_0_20"
## [13] "pbto2_0_20.icp1_20_inf" "gos"
Best Model:
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -1.3419 | 0.2378 | -5.6417 | 0 *** |
| age | -0.6674 | 0.2192 | -3.0441 | 0 *** |
| gcs | 0.4199 | 0.2212 | 1.8984 | 0.06 . |
| pbto2_0_20 | -0.4708 | 0.2398 | -1.9633 | 0.05 * |
| icp1_20_inf | -0.8928 | 0.4181 | -2.1353 | 0.03 * |
| pao2_0_300.pbto2_0_20 | -0.4539 | 0.2689 | -1.688 | 0.09 . |
Another way to look for interactions is to attempt to explain differences in predictive accuracy of a model including PbtO2 vs a model that does not include PbtO2.
For example, this is a linear model that attempts to explain the point-wise log likelihood ratio (in LOO CV) calculated between a model that includes all predictors except PbtO2 and one that is identical but also includes PbtO2:
##
## Call:
## lm(formula = llr ~ ., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.64438 -0.08060 0.01259 0.11227 0.50599
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.239e-06 1.839e-02 0.001 0.9996
## age 1.951e-02 2.039e-02 0.956 0.3403
## sex 1.888e-03 1.896e-02 0.100 0.9208
## marshall 5.814e-03 2.064e-02 0.282 0.7785
## gcs 1.525e-02 1.943e-02 0.785 0.4337
## icp1_20_inf -2.864e-03 1.929e-02 -0.148 0.8822
## pao2_0_300 3.430e-02 1.883e-02 1.822 0.0703 .
## pao2_875_inf 1.458e-02 1.934e-02 0.754 0.4522
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2369 on 158 degrees of freedom
## Multiple R-squared: 0.03558, Adjusted R-squared: -0.007151
## F-statistic: 0.8326 on 7 and 158 DF, p-value: 0.5617
Only PaO2 is intimated as being connected to the differences, and here is a plot of those differences:
As one more way to verify that no other definitions of sub-groups benefiting from PbtO2 exist, a dimensional scaling algorithm like TSNE could be used to see if inferred clusters have a relationship with predictive differences:
There is some really weak evidence that an interaction exists between PaO2 and PbtO2, implying that PbtO2 is a greater predictor of outcome when a patient experiences lower PaO2 levels.
There is also a good bit of evidence indicating that no other covariates of interest (ICP, Age, GCS, or Marshall) have such an interaction or any kind of relationship indicating clusters of behavior with respect to PbtO2.